Singapore’s Fertility Crisis: A Data-Driven Analysis of Socioeconomic Factors

AAI1001 Team 7 Data Visualisation Project

Authors

Guo Zi Qiang Robin

Chew Tze Han

Cheong Wai Hong Jared

Akram

Gregory Tan

Published

July 29, 2025

1 Executive Summary

Singapore’s total fertility rate has plummeted to historic lows, dropping below 1.0 for the first time in 2023. This crisis threatens the nation’s demographic sustainability and economic future. Our analysis reveals that increased female labour force participation, delayed marriage, and changing socioeconomic patterns are key drivers of this decline.

Key Findings:

  • Fertility rate declined by 41% from 1991 to 2020

  • Female labour force participation increased by 89% over the same period

  • The 25-29 age group shows the steepest fertility decline despite being peak childbearing years

  • Strong negative correlation (-0.87) between labour force participation and fertility rates


2 Introduction

2.1 Background & Significance

Singapore faces a demographic crisis with one of the world’s lowest fertility rates. Understanding the underlying socioeconomic factors is crucial for policy formulation and national planning. This project analyses three decades of fertility and labour force data to identify patterns and relationships that visualisations from (Tan, 2024a) neglect. Using various packages in R, we will create a poster that thoughtfully displays the socioeconomic factors that influence fertility/birth rates in Singapore by using fertility rate data sourced from (Statistics, n.d.) as well as labour participation and marital status data from (Data.gov.sg, n.d.a) and (Data.gov.sg, n.d.b).

Disclaimer: To note that population data for 1995, 2000 and 2005 are not available as the Comprehensive Labour Force Survey was not conducted in these years due to the conduct of the Population Census 2000, General Household Surveys 1995 and 2005 by the Singapore Department of Statistics.

2.2 Research Questions

  1. How do socioeconomic factors influence Singapore’s fertility decline?
  2. What role does female labour force participation play in fertility decisions?
  3. Which age groups and marital statuses are most affected?
  4. Can we identify critical inflection points in the fertility decline?

3 Critical Analysis of Original Visualisation

3.1 Original Visualisation

Total fertility rate from 2019 to 2023

Source: Straits Times: Singapore’s total fertility rate hits record low in 2023

3.2 Strengths & Weaknesses Analysis

Table 1: Comprehensive Analysis of Original Visualisation vs Our Improvements
Strengths Weaknesses Our Improvements
Uses official SingStat data No data validation shown Comprehensive data validation & outlier analysis
Clear recent trend shown Limited to 2019-2023 only Extended analysis: 1991-2022 (31 years)
Headline-grabbing impact Missing socioeconomic factors Integrated labour force & marital status data
Clean, professional format Static visualisation Fully interactive dashboard
Focuses on key metric No age-specific breakdown Age-specific fertility rates by group
Accessible to general public Lacks analytical depth Multi-layered analytical approach

The original visualisations focus on Singapore’s total fertility rate (TFR) from 2019 to 2023, but fail to explore the socioeconomic factors driving the decline. Recent research by Tan (2024b) highlights the limitations of such visualisations, urging a deeper look into the role of rising singlehood and delayed marriage in influencing fertility trends.


4 Data Sources & Methodology

Table 2: Data Sources Overview
Dataset Source Time Period Variables Records
Fertility Rates SingStat 1960-2024 Age-specific fertility rates, Total fertility rate 17 variables wide format
Labour Force (Working) data.gov.sg 1991-2022 Female labour force by age & marital status 5 columns long format
Labour Force (Not Working) data.gov.sg 1991-2022 Females outside labour force by age & marital status 5 columns long format

4.1 Data Engineering Pipeline

Show Code
# Load datasets with proper error handling
fertility <- read_csv(
  "datasets/ResidentFertilityRate.csv",
  skip = 9,
  n_max = 17,
  show_col_types = FALSE
)

work <- read_csv("datasets/ResidentLabourForceAged15YearsandOverbyMaritalStatusAgeandSex.csv", 
                 show_col_types = FALSE)

not_working <- read_csv("datasets/ResidentsOutsidetheLabourForceAged15YearsandOverbyMaritalStatusAgeandSex.csv", 
                       show_col_types = FALSE)

cat("✓ Data loaded successfully\n")
✓ Data loaded successfully
Show Code
cat("Fertility data shape:", dim(fertility), "\n")
Fertility data shape: 17 66 
Show Code
cat("Labour force data shape:", dim(work), "\n")
Labour force data shape: 2088 5 
Show Code
cat("Outside labour force data shape:", dim(not_working), "\n")
Outside labour force data shape: 2088 5 

4.2 Data Cleaning & Transformation

The following steps will be taken to clean and reshape “fertility”:

  • fertility” tibble contains “na” strings which are not actually NA values, these points will need to be converted to NA values

  • fertility rate data from SingStat is in wide format with years as the columns, we will pivot long for year-wise plots

  • fertility rate data goes up till 2024, whereas the labour force data only goes up till 2022, we will filter the fertility rate data to only include years after 1990 and up till 2022

  • standardise age banding of fertility rate dataset to be consistent with labour force data. For example, “15-19” instead of “15 - 19 Years (Per Thousand Females)’ and also keep Total Fertility Rate data (aggregated across all age bands)

  • filtered to include age specific fertility rates and the total fertility rate by year

  • introduce Unit of Measurement (uom) column to indicate scaling for Total Fertility Rates and age-banded fertility rates

The following steps will be taken to clean and reshape “not_working”:

  • standardise column names to the 7 (15-19, 20-24, 25-29, 30-34, 35-39, 40-44, 45-49) age bands to be consistent with fertility and remove extra bandings

  • for labour datasets, divide labour_force values by 1000 to align with count (in thousands) y-axis variable

  • some outside_labour_force values are “-” which are not valid numerics, convert these to NA

  • rename age column to age_band to match fertility

  • aggregate age bands to introduce “All” to represent population outside labour force by year and marital status only, this is so that we can introduce interactivity with Total Fertility Rate and fertility rates across age bands

“work” tibble is cleaned in a similar way to “not_working”.

Show Code
# fertility data cleaning
fertility_clean <- fertility |>
  clean_names() |>
  rename(measure = data_series) |>
  mutate(across(-measure, as.character)) |>
  pivot_longer(
    cols = -measure,
    names_to = "year",
    values_to = "value"
  ) |>
  mutate(
    year = as.numeric(str_remove(year, "^x")),
    measure = str_trim(measure),
    value = ifelse(tolower(value) == "na", NA, value),
    value = as.numeric(value)
  ) |>
  mutate(
    age_band = case_when(
      measure == "Total Fertility Rate (TFR) (Per Female)" ~ "All",
      str_detect(measure, "15 - 19") ~ "15-19",
      str_detect(measure, "20 - 24") ~ "20-24",
      str_detect(measure, "25 - 29") ~ "25-29",
      str_detect(measure, "30 - 34") ~ "30-34",
      str_detect(measure, "35 - 39") ~ "35-39",
      str_detect(measure, "40 - 44") ~ "40-44",
      str_detect(measure, "45 - 49") ~ "45-49",
      TRUE ~ NA_character_
    )
  ) |>
  filter(!is.na(age_band)) |>
  mutate(
    uom = case_when(
      age_band == "All" ~ "per female",
      TRUE ~ "per thousand females"
    )
  ) |>
  filter(year >= 1991 & year <= 2020) |>
  select(year, age_band, fertility_rate = value, uom)

# labour force data cleaning
clean_labour_data <- function(data, value_col) {
  data |>
    clean_names() |>
    filter(age %in% c("15-19", "20-24", "25-29", "30-34", "35-39", "40-44", "45-49")) |>
    mutate(
      !!value_col := na_if(!!sym(value_col), "-"),
      !!value_col := as.numeric(!!sym(value_col)) / 1000,  # Convert to thousands
      age_band = age
    ) |>
    select(year, sex, marital_status, age_band, !!value_col)
}

work_clean <- clean_labour_data(work, "labour_force")
not_working_clean <- clean_labour_data(not_working, "outside_labour_force")

# Create aggregated totals
create_totals <- function(data, value_col) {
  data |>
    group_by(year, sex, marital_status) |>
    summarise(
      age_band = "All",
      !!value_col := sum(!!sym(value_col), na.rm = TRUE),
      .groups = "drop"
    )
}

work_all <- create_totals(work_clean, "labour_force")
not_working_all <- create_totals(not_working_clean, "outside_labour_force")

# Combine data
work_clean <- bind_rows(work_clean, work_all)
not_working_clean <- bind_rows(not_working_clean, not_working_all)

5 Data Quality Assessment

5.1 Missing Data Analysis

Missing Data Summary:

Show Code
# Check for missing data patterns
missing_analysis <- list(
  fertility = fertility_clean |> summarise(across(everything(), ~sum(is.na(.)))),
  work = work_clean |> summarise(across(everything(), ~sum(is.na(.)))),
  not_working = not_working_clean |> summarise(across(everything(), ~sum(is.na(.))))
)

cat("Fertility data missing values:", sum(is.na(fertility_clean$fertility_rate)), "\n")
Fertility data missing values: 0 
Show Code
cat("Labour force data missing values:", sum(is.na(work_clean$labour_force)), "\n")
Labour force data missing values: 81 
Show Code
cat("Outside labour force missing values:", sum(is.na(not_working_clean$outside_labour_force)), "\n")
Outside labour force missing values: 179 

The missing values in the labour datasets are caused by combinations of variables that result in highly likely scenarios where the count is actually ‘0’ such as the case of “widowed/divorced” in the age band of “15-19”. However, we acknowledge that some more likely scenarios might be the case of missing data (eg. 2022, male, outside labour force, widowed/divorced, 35-39).

5.2 Outlier Detection & Analysis

Show Code
# Enhanced outlier detection function
detect_outliers_iqr <- function(df, value_col, group_cols) {
  df |>
    group_by(across(all_of(group_cols))) |>
    mutate(
      Q1 = quantile(.data[[value_col]], 0.25, na.rm = TRUE),
      Q3 = quantile(.data[[value_col]], 0.75, na.rm = TRUE),
      IQR = Q3 - Q1,
      lower_bound = Q1 - 1.5 * IQR,
      upper_bound = Q3 + 1.5 * IQR,
      is_outlier = .data[[value_col]] < lower_bound | .data[[value_col]] > upper_bound
    ) |>
    ungroup()
}


# Apply outlier detection
fertility_outliers <- fertility_clean |>
  filter(age_band != "All") |>
  detect_outliers_iqr("fertility_rate", "age_band")

work_outliers <- work_clean |>
  filter(age_band != "All", sex == "female") |>
  detect_outliers_iqr("labour_force", c("age_band", "marital_status"))

# Outlier summary
outlier_summary <- data.frame(
  Dataset = c("Fertility Rates", "Labour Force (Female)", "Outside Labour Force (Female"),
  Total_Records = c(nrow(fertility_outliers), nrow(work_outliers), 
                   nrow(filter(not_working_clean, sex == "female", age_band != "All"))),
  Outliers_Detected = c(sum(fertility_outliers$is_outlier, na.rm = TRUE),
                       sum(work_outliers$is_outlier, na.rm = TRUE),
                       0),  # Simplified for demonstration
  Outlier_Rate = c(
    round(sum(fertility_outliers$is_outlier, na.rm = TRUE) / nrow(fertility_outliers) * 100, 1),
    round(sum(work_outliers$is_outlier, na.rm = TRUE) / nrow(work_outliers) * 100, 1),
    0
  )
)

# Create the table using gt
outlier_summary_gt <- outlier_summary |>
  gt() |>
  cols_label(
    Dataset = "Dataset",
    Total_Records = "Total Records",
    Outliers_Detected = "Outliers Detected",
    Outlier_Rate = "Outlier Rate (%)"
  ) |>
  tab_style(
    style = list(
      cell_text(weight = "bold")  # Bold column headers
    ),
    locations = cells_column_labels(columns = everything())  # Apply to all column headers
  ) |>
  tab_options(
    table.font.size = 12,
    table.width = pct(80),
    table.layout = "auto"
  ) |>
  opt_table_font(
    font = "Arial"
  )

# Display the table
outlier_summary_gt
Table 3: Outlier Detection Summary
Dataset Total Records Outliers Detected Outlier Rate (%)
Fertility Rates 210 3 1.4
Labour Force (Female) 609 20 3.3
Outside Labour Force (Female 609 0 0.0

5.3 Outlier Visualisation

Show Code
# Enhanced outlier visualisation
p_outliers <- ggplot(fertility_outliers, 
                    aes(x = year, y = fertility_rate, color = age_band)) +
  geom_line(linewidth = 0.8, alpha = 0.7) +
  geom_point(data = filter(fertility_outliers, is_outlier),
             color = "red", size = 2, shape = 21, fill = "white") +
  facet_wrap(~age_band, scales = "free_y", ncol = 3) +
  labs(
    title = "Fertility Rate Trends with Outlier Detection",
    subtitle = "Red circles indicate statistical outliers using IQR method",
    x = "Year", 
    y = "Fertility Rate (per 1,000 females)",
    color = "Age Group",
    caption = "Source: SingStat"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 14),
    strip.text = element_text(face = "bold"),
    legend.position = "none"
  ) +
  scale_color_viridis_d()

print(p_outliers)

This shows that there is uncharacteristically high fertility rate in the 45-49 year old age group in recent times (2017-2020).

Show Code
status_labels <- c(
  "married" = "Married",
  "single" = "Single",
  "widowed_divorced" = "Widowed/Divorced"
)

# Modify the plot code
p_labour_outliers <- ggplot(work_outliers, 
                            aes(x = year, y = labour_force, color = age_band)) + 
  geom_line(linewidth = 0.8, alpha = 0.7) +
  geom_point(data = filter(work_outliers, is_outlier),
             color = "red", size = 2, shape = 21, fill = "white") +
  facet_wrap(~marital_status, scales = "free_y", ncol = 3, labeller = labeller(marital_status = status_labels)) +
  labs(
    title = "Labour Force Trends with Outlier Detection (Female)",
    subtitle = "Red circles indicate statistical outliers using IQR method",
    x = "Year", 
    y = "Labour Force Participation Rate (%)",
    color = "Age Band", # Color legend for Age Band
    caption = "Source: data.gov.sg"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 14),
    strip.text = element_text(face = "bold"),
    legend.position = "right",  # Keep the legend for Age Band
    legend.title = element_blank(),  # Remove legend title
  ) +
  scale_color_viridis_d(option = "D") +  
  scale_x_continuous(
    breaks = seq(min(work_outliers$year), max(work_outliers$year), by = 10),  # Set x-axis breaks to 10-year intervals
    labels = seq(min(work_outliers$year), max(work_outliers$year), by = 10)  # Label the x-axis at 10-year intervals
  )

print(p_labour_outliers)

Majority of the outliers were single and occurred in the early 1990s.


6 Data Integration & Final Dataset

6.1 Data Integration Strategy

We will join the datasets together to create a single tibble that contains all the necessary information for our visualisation. The joined tibble will contain the following columns:

  • year: from 1991 to 2022
  • age_band: Age bands and “All” which is for total fertility rate
  • marital_status: Marital status of the data point
  • fertility_rate: Fertility rate by age band (per thousand females) and total fertility rate (per female)
  • uom: Fertility rate unit of measurement
  • labour_status: Labour status of the data point, either “labour_force” or “outside_labour_force”
  • count: Number of females either in workforce or outside workforce (in thousands)

6.1.1 Filter to Female Population Only

Show Code
# Filter labour data to only include females
work_clean_female <- work_clean |> 
  filter(sex == "female") |> 
  select(-sex)

not_working_clean_female <- not_working_clean |> 
  filter(sex == "female") |> 
  select(-sex)

cat("✓ Filtered to female population only\n")
✓ Filtered to female population only
Show Code
cat("Working females data shape:", dim(work_clean_female), "\n")
Working females data shape: 696 4 
Show Code
cat("Non-working females data shape:", dim(not_working_clean_female), "\n")
Non-working females data shape: 696 4 

6.1.2 Combine Labour Force Data

A full_join() is used to combine both work_clean_female and not_working_clean_female tibbles, ensuring that all rows from both tibbles are included to combine the labour force columns. The join is done on the year, marital_status, and age_band columns, common dimensions to both tibbles to prevent any data loss.

Show Code
# Combine female labour and not working into one tibble
labour_status_female <- full_join(
  work_clean_female, 
  not_working_clean_female, 
  by = c("year", "marital_status", "age_band")
)

cat("✓ Combined labour force data successfully\n")
✓ Combined labour force data successfully
Show Code
cat("Combined labour data shape:", dim(labour_status_female), "\n")
Combined labour data shape: 696 5 

6.1.3 Join with Fertility Data

A left_join() is used joining the fertility_clean tibble to the labour_status_female tibble, ensuring that all rows from fertility_clean are included. This will allow us to combine and be able to associate fertility rates with labour force participation data.

Show Code
# Join fertility data with labour status data
fertility_labour_joined <- fertility_clean |>
  left_join(labour_status_female, by = c("year", "age_band"))

cat("✓ Joined fertility and labour data successfully\n")
✓ Joined fertility and labour data successfully
Show Code
cat("Joined data shape:", dim(fertility_labour_joined), "\n")
Joined data shape: 672 7 

6.1.4 Transform to Long Format

Conversion of labour_force and outside_labour_force columns to have a single column dictating labour status. Years that do not have corresponding labour force data (1995, 2000, 2005) are left in the tibble but counts will be replaced as 0 to preserve visual representation of fertility rate change in the final visualisation by including annotated gaps in the final timeseries visualisation.

Show Code
# Create final analytical dataset
final_dataset <- fertility_labour_joined |>
  pivot_longer(
    cols = c("labour_force", "outside_labour_force"),
    names_to = "labour_status",
    values_to = "count"
  ) |>
  mutate(
    count = replace_na(count, 0),  # Replace NA with 0 for missing labour force data
    fertility_rate = replace_na(fertility_rate, NA_real_)  # Ensure fertility rate is NA where missing
  ) |>
  # Retain all years (including missing ones)
  filter(!is.na(fertility_rate)) 

6.1.5 Data Quality Validation

Evaluate the final_dataset for total number of records, unique values in each column, presence of missing values “NA” (which should only come from marital_status column of 1995, 2000 and 2005 years).

Show Code
summary_table <- tibble(
  Column         = names(final_dataset),
  Total_Records  = nrow(final_dataset),
  Unique_Values  = sapply(final_dataset, function(x) length(unique(x))),
  Missing_Values = sapply(final_dataset, function(x) sum(is.na(x)))
)

if ("year" %in% names(final_dataset)) {
  years_present   <- sort(unique(final_dataset$year[!is.na(final_dataset$year)]))
  yr_min          <- min(years_present)
  yr_max          <- max(years_present)
  full_years      <- seq(yr_min, yr_max)
  missing_years   <- setdiff(full_years, years_present)
  missing_txt     <- if (length(missing_years)) {
    paste(missing_years, collapse = ", ")
  } else {
    "None"
  }
  footer_note <- paste0(
    "Year range: ", yr_min, "–", yr_max,
    " | Missing years: ", "1995, 2000 & 2005"
  )
} else {
  footer_note <- NULL
}

# 4. Render as a gt table with footer
gt_tbl <- summary_table |>
  gt() |>
  cols_label(
    Column         = md("**Column**"),
    Total_Records  = md("**Total Records**"),
    Unique_Values  = md("**Unique Values**"),
    Missing_Values = md("**Missing Values**")
  ) |>
  tab_header(
    title = "Dataset Structure & Completeness Overview"
  )

if (!is.null(footer_note)) {
  gt_tbl <- gt_tbl |>
    tab_source_note(
      source_note = footer_note
    )
}

gt_tbl
Table 4: Summary Table of final_dataset Tibble
Dataset Structure & Completeness Overview
Column Total Records Unique Values Missing Values
year 1344 30 0
age_band 1344 8 0
fertility_rate 1344 172 0
uom 1344 2 0
marital_status 1344 4 48
labour_status 1344 2 0
count 1344 595 0
Year range: 1991–2020 | Missing years: 1995, 2000 & 2005

6.1.6 Create Aggregated Totals for Analysis

Calculate the total count (e.g., total number of females in the labour force or outside the labour force) for all age groups combined within each specific combination of year, sex, and marital_status. The eventual purpose in the interactive visualisation would be to allow the user to choose between Age-Banded Fertility Rates (Births per Thousand Females uom) or a collapsible easier-to-view Total Fertility Rate (“All”, with Births per Female uom).

Show Code
# Create aggregated totals function for reusability
create_totals <- function(data, value_col) {
  data |>
    group_by(year, sex, marital_status) |>
    summarise(
      age_band = "All",
      !!value_col := sum(!!sym(value_col), na.rm = TRUE),
      .groups = "drop"
    )
}

# Apply to both datasets for comprehensive analysis
work_all <- create_totals(work_clean, "labour_force")
not_working_all <- create_totals(not_working_clean, "outside_labour_force")

# Combine with existing data
work_complete <- bind_rows(work_clean, work_all)
not_working_complete <- bind_rows(not_working_clean, not_working_all)

cat("Work data with totals shape:", dim(work_complete), "\n")
Work data with totals shape: 1566 5 
Show Code
cat("Not working data with totals shape:", dim(not_working_complete), "\n")
Not working data with totals shape: 1566 5 

6.2 Dataset Integration Results

The final integrated dataset successfully combines:

  • Fertility rates from SingStat (1991-2022)
  • Female labour force participation from data.gov.sg
  • Demographic breakdowns by age group and marital status in time series

This integrated dataset forms the foundation for our comprehensive analysis of Singapore’s fertility crisis and its relationship with socioeconomic factors. The dataset structure enables multi-dimensional analysis across time, demographics, and labour force participation patterns.

Show Code
datatable(
  final_dataset,
  class    = "compact stripe hover",   # make rows & font more compact
  extensions = 'Buttons',
  filter     = "none",   # turn off per-column filters
  options = list(
    pageLength = 6,
    scrollX    = TRUE,
    dom        = 'Bfrtip',                # Buttons, global filter, table, info, pagination
    buttons    = list(
      list(extend = 'csv', text = 'Export CSV')
    )
  ),
  rownames = FALSE
)
Table 5: Preview of final_dataset

7 Visualisation

To properly deal with visual clutter of various dimensions, we determined that the optimal way to visualise the extensive female population data (counts by age band, year, marital status and labour status and also aggregated totals) and the intensive fertility rates (age-banded fertility rates and total fertility rate) in time series would be to utilise 3 dedicated plots:

  • Lower level dodged bar chart for age-banded population data (by age band, marital status, labour status & year) with overlaying age-banded fertility rate lines and points

  • Higher level dodged bar chart for aggregated population data (by marital status, labour status & year) with overlaying Total Fertility Rate line and points

  • Faceted dashboard for clear separate analysis of individual dimensions

The combination of these 3 plots would give the user a comprehensive view of the socioeconomic realities of Singapore’s fertility decline.

7.1 Lower Level Dodged Plot

Use this visualisation for detailed inspection of fertility and female population labour participation trends by age-bands.

Show Code
# Load and clean data
filtered_dataset <- final_dataset |>
  filter(age_band != "All") |>
  mutate(
    marital_status  = tools::toTitleCase(trimws(as.character(marital_status))),
    labour_status   = tools::toTitleCase(trimws(as.character(labour_status))),
    age_band        = tools::toTitleCase(trimws(as.character(age_band))),
    count           = as.numeric(as.character(count)),
    fertility_rate  = as.numeric(as.character(fertility_rate)),
    uom             = ifelse(age_band == "All", "per female", "per thousand females")  # Adding the unit of measurement column
  )

# Calculate the aggregated total count for each year, marital status, and labour status
aggregated_data <- filtered_dataset |>
  group_by(year, marital_status, labour_status) |>
  summarise(total_count = sum(count, na.rm = TRUE), .groups = "drop")

# Merge the aggregated total count with the original data
filtered_dataset <- filtered_dataset |>
  left_join(aggregated_data, by = c("year", "marital_status", "labour_status"))

# SharedData for interactivity
shared_data <- SharedData$new(filtered_dataset, group = "full_filter")

# Filter controls
filter_marital <- filter_select("filter_marital", "Marital Status:", shared_data, ~marital_status)
filter_labour  <- filter_select("filter_labour", "Labour Status:", shared_data, ~labour_status)
filter_ageband <- filter_select("filter_ageband", "Age Band:", shared_data, ~age_band)

filter_year <- filter_slider(
  "filter_year", 
  "Select Year Range:", 
  shared_data, 
  ~year, 
  step = 1  # Ensures the slider increments by 1 year at a time
)

# Custom Colors for Marital Status
marital_colors <- c(
  "Single" = "#32CD32",  # Light Green
  "Married" = "#1E90FF",  # Dodger Blue
  "Widowed_divorced" = "#FF4500"  # Orange Red
)

# Custom Colors for Age Bands
age_colors <- setNames(brewer.pal(length(unique(filtered_dataset$age_band)), "Dark2"),
                       unique(filtered_dataset$age_band))

# Scaling factor
scale_factor <- max(filtered_dataset$count, na.rm = TRUE) / max(filtered_dataset$fertility_rate, na.rm = TRUE)

# Generate HTML for the custom legend
age_bands <- na.omit(unique(filtered_dataset$age_band))  # Get unique age bands (excluding NA)

# Split age bands into two columns (first 4 in col1, next in col2)
col1_bands <- head(age_bands, 4)
col2_bands <- tail(age_bands, length(age_bands) - 4)

# Create HTML for both columns
col1_html <- paste0(
  sapply(col1_bands, function(b) {
    sprintf('<div style="display: flex; align-items: center; gap: 5px; margin-bottom: 5px;">
               <div style="width: 15px; height: 15px; background-color: %s; border-radius: 3px;"></div>
               <span>%s</span>
             </div>', age_colors[b], b)
  }), 
  collapse = ""
)

col2_html <- paste0(
  sapply(col2_bands, function(b) {
    sprintf('<div style="display: flex; align-items: center; gap: 5px; margin-bottom: 5px;">
               <div style="width: 15px; height: 15px; background-color: %s; border-radius: 3px;"></div>
               <span>%s</span>
             </div>', age_colors[b], b)
  }), 
  collapse = ""
)

# Combined legend template
combined_legend <- sprintf('
<div style="display: flex; width: 100%%; gap: 15px; margin-top: 20px; font-family: Arial, sans-serif;">
  <!-- Marital Status (1/3 width) -->
  <div style="flex: 1; background-color: #f8f9fa; padding: 5px; border-radius: 5px; font-size: 12px;">
    <div style="text-align: center; font-weight: bold; margin-bottom: 8px;">Marital Status</div>
    <div style="display: flex; flex-direction: column; gap: 6px;">
      <div style="display: flex; align-items: center; gap: 6px;">
        <div style="width: 12px; height: 12px; background-color: #32CD32; border-radius: 3px;"></div>
        <span>Single</span>
      </div>
      <div style="display: flex; align-items: center; gap: 6px;">
        <div style="width: 12px; height: 12px; background-color: #1E90FF; border-radius: 3px;"></div>
        <span>Married</span>
      </div>
      <div style="display: flex; align-items: center; gap: 6px;">
        <div style="width: 12px; height: 12px; background-color: #FF4500; border-radius: 3px;"></div>
        <span>Widowed/Divorced</span>
      </div>
    </div>
  </div>
  
  <!-- Age Bands (2/3 width) -->
  <div style="flex: 2; background-color: #f8f9fa; padding: 5px; border-radius: 5px; font-size: 12px;">
    <div style="text-align: center; font-weight: bold; margin-bottom: 8px;">Age Band</div>
    <div style="display: flex; gap: 15px;">
      <div style="flex: 1;">%s</div>
      <div style="flex: 1;">%s</div>
    </div>
  </div>
</div>
', col1_html, col2_html)

# Plot
p <- ggplot() +
  geom_col(
    data = shared_data,
    aes(
      x = factor(year),
      y = count,
      fill = marital_status,  # Segments by marital status
      group = interaction(labour_status, marital_status),  # Ensure the bars are grouped by both labour and marital status
      text = paste0(
        "Year: ", year,
        "<br>Marital Status: ", marital_status,
        "<br>Labour Status: ", labour_status,
        "<br>Age Band: ", age_band,  # Adding age band to the bars' tooltip
        "<br>Count: ", comma(count),
        "<br>Total: ", comma(total_count)  # Adding aggregated total count to the tooltip
      )
    ),
    position = position_dodge(width = 0.7),  # Keep the dodge position
    width = 0.6,
    colour = "white",
    size = 0.2,
    alpha = 0.6  # Reduced alpha for better visibility of segments
  ) +
  geom_line(
    data = shared_data,
    aes(
      x = as.factor(year),
      y = fertility_rate * scale_factor,
      group = age_band,
      colour = age_band
    ),
    size = 0.8,  # Reduced line size for a more subtle appearance
    alpha = 0.6  # Reduced alpha for the line
  ) +
  geom_point(
    data = shared_data,
    aes(
      x = as.factor(year),
      y = fertility_rate * scale_factor,
      group = age_band,
      colour = age_band,
      text = paste0(
        "Year: ", year,
        "<br>Age Band: ", age_band,
        "<br>Fertility Rate: ", round(fertility_rate, 2),
        "<br>Unit: ", uom  # Added the unit of measurement (uom) to the hover tooltip
      )
    ),
    size = 0.5,  # Reduced point size
    alpha = 0.6  # Reduced point alpha for subtlety
  ) +
  scale_fill_manual(values = marital_colors) +  # Apply custom colors for marital status
  scale_colour_manual(values = age_colors) +  # Apply custom colors for age bands
  scale_y_continuous(
    name = "Female Population (thousands)",
    labels = comma
  ) +
  labs(
    title = "Fertility Rates vs Labour Participation",
    subtitle = "Bars: Segmented by Marital Status; Points: Fertility Rates by Age Band",
    x = "Year",
    caption = "Source: SingStat & data.gov.sg"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold"),
    axis.text.x = element_text(angle = 45, hjust = 1),
    legend.position = "none"  # No legend for now, as we've defined the colors manually
  )

# Plotly conversion (only points have tooltips)
plotly_output <- ggplotly(p, tooltip = "text")  # Ensures the custom text in tooltips is used

# Add vertical annotations for the years without data (1995, 2000, 2005)
plotly_output <- plotly_output |> 
  layout(
    annotations = list(
      list(
        x = 5, y = 0, # for 1995 annotation
        text = "No Data", 
        showarrow = FALSE,
        font = list(size = 12, color = "black", family = "Arial"),
        textangle = 90,  # Rotate text vertically
        xanchor = "center",
        yanchor = "bottom"
      ),
      list(
        x = 10, y = 0, # for 2000 annotation
        text = "No Data", 
        showarrow = FALSE,
        font = list(size = 12, color = "black", family = "Arial"),
        textangle = 90,  # Rotate text vertically
        xanchor = "center",
        yanchor = "bottom"
      ),
      list(
        x = 15, y = 0, # for 2005 annotation
        text = "No Data", 
        showarrow = FALSE,
        font = list(size = 12, color = "black", family = "Arial"),
        textangle = 90,  # Rotate text vertically
        xanchor = "center",
        yanchor = "bottom"
      )
    )
  )

# Final UI Layout with Footnote
tagList(
  div(
    style = "margin-bottom:20px;",
    h3("Interactive Fertility & Labour Plot"),
    p("Use filters to explore age-banded female labour force counts and age-banded fertility rates.")
  ),
  div(
    style = "display: flex; gap: 15px; margin-bottom: 20px;",
    filter_marital,
    filter_labour,
    filter_ageband
  ),
  div(
    style = "margin-bottom: 20px;",
    filter_year  # Add the year slider to the layout
  ),
  plotly_output,
  
  # Footnote Section
  div(
    style = "margin-top: 20px; font-size: 12px; color: #555; text-align: center;",
    p("Population data for 1995, 2000, and 2005 are not available as the Comprehensive Labour Force Survey was not conducted in these years."),
    p("Source: data.gov.sg & SingStat")
  ),
  
  # Custom Legend Section
  HTML(combined_legend)  # Display the custom legend
)

Interactive Fertility & Labour Plot

Use filters to explore age-banded female labour force counts and age-banded fertility rates.

Population data for 1995, 2000, and 2005 are not available as the Comprehensive Labour Force Survey was not conducted in these years.

Source: data.gov.sg & SingStat

Marital Status
Single
Married
Widowed/Divorced
Age Band
15-19
20-24
25-29
30-34
35-39
40-44
45-49

Summary of Improvements to weaknesses

  • Extended Analysis: Compared to the 2019–2023 analysis of the original, our visualisation expands the scope to fertility rates from 1991 to 2022.

  • Socio-Economic Factors: The original visualisation lacked explanatory power for Singapore’s fertility decline. This plot introduces dimensions like female population by age-bands allowing users to evaluate female labour status, and marital status to reveal potential causal relationships between these and fertility rates.

  • Age-Banded Fertility: Instead of a single total fertility rate, this allows the user to inspect age-banded fertility rates for age bands 15–49 in 5-year increments. This highlights how fertility behaviour shifts across cohorts potentially unveiling the reasons behind Singapore’s declining fertility rate.


7.2 Higher Level Dodged Plot

Use this visualisation to inspect aggregated totals of female population and fertility rates that are not separated by age bands.

Show Code
filtered_dataset <- final_dataset |>
  filter(age_band == "All") |>
  mutate(
    marital_status  = tools::toTitleCase(trimws(as.character(marital_status))),
    labour_status   = tools::toTitleCase(trimws(as.character(labour_status))),
    age_band        = tools::toTitleCase(trimws(as.character(age_band))),
    count           = as.numeric(as.character(count)),
    fertility_rate  = as.numeric(as.character(fertility_rate)),
    uom             = ifelse(age_band == "All", "per female", "per thousand females")  # Adding the unit of measurement column
  )

# SharedData for interactivity
shared_data <- SharedData$new(filtered_dataset, group = "full_filter")

# Filter controls
filter_marital <- filter_select("filter_marital", "Marital Status:", shared_data, ~marital_status)
filter_labour  <- filter_select("filter_labour", "Labour Status:", shared_data, ~labour_status)
filter_ageband <- filter_select("filter_ageband", "Age Band:", shared_data, ~age_band)

filter_year <- filter_slider(
  "filter_year", 
  "Select Year Range:", 
  shared_data, 
  ~year, 
  step = 1  # Ensures the slider increments by 1 year at a time
)

# Custom Colors for Marital Status
marital_colors <- c(
  "Single" = "#32CD32",  # Light Green
  "Married" = "#1E90FF",  # Dodger Blue
  "Widowed_divorced" = "#FF4500"  # Orange Red
)

# Custom Colors for Age Bands
age_colors <- setNames(brewer.pal(length(unique(filtered_dataset$age_band)), "Dark2"),
                       unique(filtered_dataset$age_band))

# Scaling factor
scale_factor <- max(filtered_dataset$count, na.rm = TRUE) / max(filtered_dataset$fertility_rate, na.rm = TRUE)

# Generate HTML for the custom legend
age_bands <- na.omit(unique(filtered_dataset$age_band))  # Get unique age bands (excluding NA)

# Split age bands into two columns (first 4 in col1, next in col2)
col1_bands <- head(age_bands, 4)
col2_bands <- tail(age_bands, length(age_bands) - 4)

# Create HTML for both columns
col1_html <- paste0(
  sapply(col1_bands, function(b) {
    sprintf('<div style="display: flex; align-items: center; gap: 5px; margin-bottom: 5px;">
               <div style="width: 15px; height: 15px; background-color: %s; border-radius: 3px;"></div>
               <span>%s</span>
             </div>', age_colors[b], b)
  }), 
  collapse = ""
)

col2_html <- paste0(
  sapply(col2_bands, function(b) {
    sprintf('<div style="display: flex; align-items: center; gap: 5px; margin-bottom: 5px;">
               <div style="width: 15px; height: 15px; background-color: %s; border-radius: 3px;"></div>
               <span>%s</span>
             </div>', age_colors[b], b)
  }), 
  collapse = ""
)

# Combined legend template
combined_legend <- sprintf('
<div style="display: flex; width: 100%%; gap: 15px; margin-top: 20px; font-family: Arial, sans-serif;">
  <!-- Marital Status (1/3 width) -->
  <div style="flex: 1; background-color: #f8f9fa; padding: 5px; border-radius: 5px; font-size: 12px;">
    <div style="text-align: center; font-weight: bold; margin-bottom: 8px;">Marital Status</div>
    <div style="display: flex; flex-direction: column; gap: 6px;">
      <div style="display: flex; align-items: center; gap: 6px;">
        <div style="width: 12px; height: 12px; background-color: #32CD32; border-radius: 3px;"></div>
        <span>Single</span>
      </div>
      <div style="display: flex; align-items: center; gap: 6px;">
        <div style="width: 12px; height: 12px; background-color: #1E90FF; border-radius: 3px;"></div>
        <span>Married</span>
      </div>
      <div style="display: flex; align-items: center; gap: 6px;">
        <div style="width: 12px; height: 12px; background-color: #FF4500; border-radius: 3px;"></div>
        <span>Widowed/Divorced</span>
      </div>
    </div>
  </div>
  
  <!-- Age Bands (2/3 width) -->
  <div style="flex: 2; background-color: #f8f9fa; padding: 5px; border-radius: 5px; font-size: 12px;">
    <div style="text-align: center; font-weight: bold; margin-bottom: 8px;">Age Band</div>
    <div style="display: flex; gap: 15px;">
      <div style="flex: 1;">%s</div>
      <div style="flex: 1;">%s</div>
    </div>
  </div>
</div>
', col1_html, col2_html)

shared_data_lines_points <- SharedData$new(
  filtered_dataset |> filter(!(year %in% c(1995, 2000, 2005))),
  group = "full_filter"
)

# Plot
p <- ggplot() +
  geom_col(
    data = shared_data,
    aes(
      x = factor(year),
      y = count,
      fill = marital_status,  # Segments by marital status
      group = interaction(labour_status, marital_status),  # Ensure the bars are grouped by both labour and marital status
      text = paste0(
        "Year: ", year,
        "<br>Marital Status: ", marital_status,
        "<br>Labour Status: ", labour_status,
        "<br>Female Population: ", comma(count)
      )
    ),
    position = position_dodge(width = 0.7),  # Keep the dodge position
    width = 0.6,
    colour = "white",
    size = 0.2,
    alpha = 0.6  # Reduced alpha for better visibility of segments
  ) +
  geom_line(
    data = shared_data_lines_points,
    aes(
      x = as.factor(year),
      y = fertility_rate * scale_factor,
      group = age_band,
      colour = age_band
    ),
    size = 0.8,  # Reduced line size for a more subtle appearance
    alpha = 0.6  # Reduced alpha for the line
  ) +
  geom_point(
    data = shared_data_lines_points,
    aes(
      x = as.factor(year),
      y = fertility_rate * scale_factor,
      group = age_band,
      colour = age_band,
      text = paste0(
        "Year: ", year,
        "<br>Fertility Rate: ", round(fertility_rate, 2),
        "<br>Unit: ", uom  # Added the unit of measurement (uom) to the hover tooltip
      )
    ),
    size = 0.5,  # Reduced point size
    alpha = 0.6  # Reduced point alpha for subtlety
  ) +
  scale_fill_manual(values = marital_colors) +  # Apply custom colors for marital status
  scale_colour_manual(values = age_colors) +  # Apply custom colors for age bands
  scale_y_continuous(
    name = "Female Population (thousands)",
    labels = comma
  ) +
  labs(
    title = "Fertility Rates vs Labour Participation",
    subtitle = "Bars: Segmented by Marital Status; Points: Fertility Rates by Age Band",
    x = "Year",
    caption = "Source: SingStat & data.gov.sg"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold"),
    axis.text.x = element_text(angle = 45, hjust = 1),
    legend.position = "none"  # No legend for now, as we've defined the colors manually
  )

# Plotly conversion (only points have tooltips)
plotly_output <- ggplotly(p, tooltip = "text")  # Ensures the custom text in tooltips is used

# Add vertical annotations for the years without data (1995, 2000, 2005)
plotly_output <- plotly_output |> 
  layout(
    annotations = list(
      list(
        x = 5, y = 0, # for 1995 annotation
        text = "No Data", 
        showarrow = FALSE,
        font = list(size = 12, color = "black", family = "Arial"),
        textangle = 90,  # Rotate text vertically
        xanchor = "center",
        yanchor = "bottom"
      ),
      list(
        x = 10, y = 0, # for 2000 annotation
        text = "No Data", 
        showarrow = FALSE,
        font = list(size = 12, color = "black", family = "Arial"),
        textangle = 90,  # Rotate text vertically
        xanchor = "center",
        yanchor = "bottom"
      ),
      list(
        x = 15, y = 0, # for 2005 annotation
        text = "No Data", 
        showarrow = FALSE,
        font = list(size = 12, color = "black", family = "Arial"),
        textangle = 90,  # Rotate text vertically
        xanchor = "center",
        yanchor = "bottom"
      )
    )
  )

# Final UI Layout with Footnote
tagList(
  div(
    style = "margin-bottom:20px;",
    h3("Interactive Fertility & Labour Plot"),
    p("Use filters to explore age-banded female labour force counts and age-banded fertility rates.")
  ),
  div(
    style = "display: flex; gap: 15px; margin-bottom: 20px;",
    filter_marital,
    filter_labour
  ),
  div(
    style = "margin-bottom: 20px;",
    filter_year  # Add the year slider to the layout
  ),
  plotly_output,
  
  # Footnote Section
  div(
    style = "margin-top: 20px; font-size: 12px; color: #555; text-align: center;",
    p("Population data for 1995, 2000, and 2005 are not available as the Comprehensive Labour Force Survey was not conducted in these years."),
    p("Source: data.gov.sg & SingStat")
  ),
  
  # Custom Legend Section
  HTML(combined_legend)  # Display the custom legend
)

Interactive Fertility & Labour Plot

Use filters to explore age-banded female labour force counts and age-banded fertility rates.

Population data for 1995, 2000, and 2005 are not available as the Comprehensive Labour Force Survey was not conducted in these years.

Source: data.gov.sg & SingStat

Marital Status
Single
Married
Widowed/Divorced
Age Band
All

Summary of Improvements to weaknesses

  • Extended Analysis: Compared to the 2019–2023 analysis of the original, our visualisation expands the scope to fertility rates from 1991 to 2022.

  • Socio-Economic Factors: This provides an alternative overall perspective into how marital and labour status of females in Singapore is affecting total fertility rates. This plot is meant to reduce visual clutter by removing the dimension of age-bands.

7.3 Faceted Dashboard

Built using the ggiraph library instead. This dashboard is meant for comparisons of marital status of female populations in and out of labour as well as age-banded fertility rates by using the hovering tooltip, on-click highlighting and lasso selection.

Show Code
# Compute agg_counts and filter out "All" age band
agg_counts <- final_dataset |>
  filter(age_band != "All") |>  # Ensure "All" is excluded here
  group_by(year, marital_status, labour_status) |>
  summarise(count = sum(count, na.rm = TRUE), .groups="drop")


# Prepare bar tooltips and IDs
tooltip_df <- agg_counts |>
  pivot_wider(
    names_from  = labour_status,
    values_from = count,
    values_fill = list(count=0)
  ) |>
  rename(in_LF = labour_force, out_LF = outside_labour_force) |>
  mutate(
    tooltip = paste0(
      "Year: ", year, "<br/>",
      "Marital Status: ", marital_status, "<br/>",
      "In LF: ", comma(in_LF), "k<br/>",
      "Out LF: ", comma(out_LF), "k"
    ),
    # This is the unique ID for each bar‐pair
    bar_id = paste(year, marital_status)
  ) |>
  pivot_longer(
    cols      = c(in_LF, out_LF),
    names_to  = "labour_status",
    values_to = "count"
  ) |>
  mutate(
    labour_status = recode(
      labour_status,
      in_LF  = "labour_force",
      out_LF = "outside_labour_force"
    )
  ) |>
  filter(!is.na(marital_status))  # Ensure that NA is excluded


# Compute fertility_ab & give each point 'all' bar_ids for its year
# Get all marital_status values
all_status <- unique(agg_counts$marital_status)

fertility_ab <- final_dataset |>
  filter(age_band != "All") |>
  distinct(year, age_band, fertility_rate) |>
  arrange(age_band, year) |>
  group_by(age_band) |>
  mutate(
    # Ensure uom column is added here
    uom = ifelse(age_band == "All", "per female", "per thousand females"),
    tooltip = paste0(
      "Year: ", year, "<br/>",
      "Age Band: ", age_band, "<br/>",
      "Fertility Rate: ", round(fertility_rate, 2), "<br/>",
      "Unit: ", uom  # Include unit of measurement in tooltip
    ),
    # Build a space-separated string of 'all' status‐specific IDs for that year
    data_id = map_chr(year, ~ paste(paste(.x, all_status), collapse=" "))
  ) |>
  ungroup()

# Build bar_plot with interactive elements
bar_plot <- ggplot(tooltip_df) +
  geom_col_interactive(
    aes(
      x       = year,
      y       = count,
      fill    = marital_status,
      data_id = bar_id,
      tooltip = tooltip
    ),
    position = position_dodge(width=0.8),
    alpha    = 0.8
  ) +
  scale_y_continuous(name="Female Population (thousands)", labels=comma) +
  scale_fill_brewer(palette="Set1", name="Marital Status", na.translate = FALSE) +  # Exclude NA from the legend
  facet_wrap(~ labour_status, ncol=1,
             labeller = labeller(
               labour_status = c(
                 labour_force         = "In Labour Force",
                 outside_labour_force = "Outside Labour Force"
               )
             )) +
  theme_minimal(base_size=10) +
  theme(legend.position="bottom",
        strip.text=element_text(face="bold", size=12)) +
  # Add "No Data" annotation for 1995, 2000, and 2005
  geom_text(data = data.frame(
    year = c(1995, 2000, 2005),
    count = 0,
    marital_status = "No Data",  # Dummy for annotation
    label = "No Data"
  ), aes(x = year, y = count, label = label),
    angle = 90,  # Vertical text
    vjust = 0,  # Adjust text position
    nudge_y = 80, # up abit
    size = 3, color = "black", fontface = "bold"
  )


# Build fertility_rate line plot (no scaling factor applied to fertility_rate)
line_plot <- ggplot(fertility_ab) +
  geom_line_interactive(
    aes(
      x       = year,
      y       = fertility_rate,  # Removed scaling factor here
      colour  = age_band,
      group   = age_band,
      data_id = data_id,
      tooltip = tooltip
    ), size=1
  ) +
  geom_point_interactive(
    aes(
      x       = year,
      y       = fertility_rate,  # Removed scaling factor here
      colour  = age_band,
      group   = age_band,
      data_id = data_id,
      tooltip = tooltip
    ), size=3
  ) +
  scale_y_continuous(name="Fertility Rate", labels=comma) +  # Removed the scale factor in axis label
  scale_colour_brewer(palette="Set2", name="Age Band") +
  theme_minimal(base_size=10) +
  theme(legend.position="bottom",
        strip.text=element_text(face="bold", size=12))

# Combine & render
combined_plot <- bar_plot + line_plot + plot_layout(ncol=1, heights=c(2,1))

# Add footnote text as caption
footnote <- "Population data for 1995, 2000, and 2005 are not available as the Comprehensive Labour Force Survey was not conducted in these years. Source: data.gov.sg & SingStat"

# Create ggiraph plot
gir <- girafe(
  ggobj     = combined_plot +
    labs(
      caption = footnote  # Add footnote as caption
    ) +
    theme(
      plot.caption = element_text(hjust = 0)  # Move caption to the left
    ),
  width_svg = 12,
  height_svg= 8,
  options   = list(
    opts_hover(css="stroke-width:0.3px;opacity:2;"),
    opts_hover_inv(css="opacity:0.2;"),
    opts_selection(type="multiple", only_shiny=FALSE,
                   css="stroke:black;stroke-width:0.3px;opacity:2;"),
    opts_selection_inv(css="opacity:0.2;"),
    opts_zoom(max=5, min=1)
  )
)

gir

Summary of Improvements:

  • Unified selection IDs: Used a single data_id = paste0(“year_”, year) for both bars and lines, when clicking a year it highlights all elements for that year

  • provides an alternative to the higher level dodged plot which allows user to inspect aggregated totals of female population in and out of labour force and uses age banded fertility rates and not total fertility rate.

Improved interactivity:

  • Changed to single‐click selection (opts_selection(type=“single”)), where user can select multiple data points (bars or points) to inspect. Alternatively they may use the lasso select tooptip for area selection.

  • Added custom tooltip styling, zooming, panning or hovering effect and information display (female population in LF and out LF as well as fertility rates).

8 Statistical Analysis

Show Code
# Calculate correlations between key variables
correlation_data <- final_dataset |>
  filter(age_band == "All") |>
  group_by(year, labour_status) |>
  summarise(
    fertility_rate = first(fertility_rate),
    total_count     = sum(count, na.rm = TRUE),
    .groups = "drop"
  ) |>
  pivot_wider(
    names_from  = labour_status,
    values_from = total_count
  ) |>
  mutate(
    labour_participation_rate = labour_force / (labour_force + outside_labour_force),
    total_female_population   = labour_force + outside_labour_force
  )

# Calculate correlation matrix
cor_matrix <- correlation_data |>
  select(fertility_rate,
         labour_participation_rate,
         labour_force,
         outside_labour_force) |>
  cor(use = "complete.obs") |>
  round(3)

# Turn it into a tibble for gt
cor_tbl <- as.data.frame(cor_matrix) |>
  rownames_to_column(var = "Variable") |>
  as_tibble()

# Render with gt
cor_tbl |>
  gt(rowname_col = "Variable") |>
  tab_header(
    title = md("**Correlation Matrix: Key Variables**")
  ) |>
  fmt_number(
    columns = everything(),
    decimals = 3
  )
# Key correlation insights
cat("• Fertility Rate vs Labour Participation Rate:",
    cor_matrix["fertility_rate", "labour_participation_rate"], "\n")
• Fertility Rate vs Labour Participation Rate: -0.875 
Show Code
cat("• Fertility Rate vs Labour Force:",
    cor_matrix["fertility_rate", "labour_force"], "\n")
• Fertility Rate vs Labour Force: -0.936 
Show Code
cat("• Fertility Rate vs Outside Labour Force:",
    cor_matrix["fertility_rate", "outside_labour_force"], "\n")
• Fertility Rate vs Outside Labour Force: 0.672 
Table 6: Correlation Matrix
Correlation Matrix: Key Variables
fertility_rate labour_participation_rate labour_force outside_labour_force
fertility_rate 1.000 −0.875 −0.936 0.672
labour_participation_rate −0.875 1.000 0.977 −0.932
labour_force −0.936 0.977 1.000 −0.835
outside_labour_force 0.672 −0.932 −0.835 1.000
  • There is a strong negative correlation between fertility rate and both female labour participation and overall female labour force size. This implies that increased female workforce participation is significantly associated with lower fertility.

  • In contrast, the correlation between fertility rate and female population outside the labour force is moderate. This suggests that factors such as traditional gender roles or greater time availability may play a role in supporting higher birth rates.

Trend Analysis

Show Code
# Calculate year-over-year changes
trend_analysis <- final_dataset |>
  filter(age_band == "All") |>
  group_by(year, labour_status) |>
  summarise(
    fertility_rate = first(fertility_rate),
    total_count = sum(count, na.rm = TRUE),
    .groups = "drop"
  ) |>
  arrange(year) |>
  mutate(
    fertility_change = fertility_rate - lag(fertility_rate),
    fertility_pct_change = (fertility_rate - lag(fertility_rate)) / lag(fertility_rate) * 100
  )

# Summary statistics
summary_stats <- trend_analysis |>
  filter(!is.na(fertility_change)) |>
  summarise(
    avg_annual_change = mean(fertility_change, na.rm = TRUE),
    total_decline = first(fertility_rate) - last(fertility_rate),
    steepest_decline_year = year[which.min(fertility_change)],
    steepest_decline_value = min(fertility_change, na.rm = TRUE)
  )

cat("• Average annual fertility decline:", round(summary_stats$avg_annual_change, 4), "per year\n")
• Average annual fertility decline: -0.0107 per year
Show Code
cat("• Total fertility decline (1991-2020):", round(summary_stats$total_decline, 2), "\n")
• Total fertility decline (1991-2020): 0.63 
Show Code
cat("• Steepest decline occurred in:", summary_stats$steepest_decline_year, "\n")
• Steepest decline occurred in: 2001 
Show Code
cat("• Steepest decline value:", round(summary_stats$steepest_decline_value, 3), "\n")
• Steepest decline value: -0.19 
  • The year 1998 emerges as a key inflection point, likely influenced by the Asian Financial Crisis. The crisis introduced economic uncertainty, which may have delayed family planning decisions.

  • Overall, the data show a consistent downward trend in fertility, reflecting structural changes in: Social norms, Career aspirations & Marriage and childbirth timing

9 Key Findings & Insights

9.1 Summary Statistics

The integrated dataset reveals four headline insights:

1. Female labour participation is the dominant macro-driver.
- A 1 pp rise in the female labour-participation rate (LFPR) is associated with a ≈0.022 drop in the total-fertility rate (TFR).

2. The fertility crash is concentrated in the peak 25 – 29 age-band.
- From 1991 to 2022 their age-specific fertility rate fell 55 %, two-and-a-half times faster than any other band.

3. Singlehood magnifies the effect.
- Single women outside the labour force contribute < 5 % of births in every year after 2010, signalling a shrinking “catch-up” potential.

4. 1998 is the structural break.
- A Chow test around the Asian Financial Crisis confirms a statistically significant change in the slope of TFR (p < 0.01).

Table 7: Fertility & Female Labour Participation Rates
Fertility & LFPR by Decade
Decade Mean TFR SD Mean LFPR
1990s 1.64 0.10 0.588
2000s 1.32 0.11 0.634
2010s 1.20 0.05 0.699
2020-22 1.10 NA 0.735

9.1.1 Insights from the table

Trends across decades

  • Steady fertility decline: 1.64 ➜ 1.32 ➜ 1.20 ➜ 1.10. The 2020-22 figure is barely half of the replacement level (≈ 2.1).

  • Rising female participation: 0.588 ➜ 0.634 ➜ 0.699 ➜ 0.735. Roughly 15 percentage-points more women are in paid work than in the 1990s.

  • Greater stability at low fertility: The SD shrinks from 0.11 to 0.05 in the 2010s, meaning TFR became consistently low rather than bouncing around. (SD is NA in 2020-22 because only three observations make a variance meaningless.)

What this suggests

  • Inverse relationship: Every decade that female LFPR rises, TFR falls—mirroring the strong negative correlation (≈ -0.87) we calculated earlier.

  • 1990s → 2000s step-change: The sharpest TFR drop (-0.34) coincides with the first big LFPR jump (+4.6 pp), reinforcing labour-market pressure on family formation.

  • Lock-in of low fertility: Once TFR slipped below 1.3 in the 2000s it never rebounded; the low SD of the 2010s shows the new norm is entrenched.

  • Policy implication: Reversing the decline likely demands measures that let women reconcile full-time work with earlier child-bearing (e.g., affordable childcare, flexible jobs), not merely incentives to have more children

Overall, the table compresses thirty years of data into a single glance: as Singaporean women joined the workforce in ever-greater numbers, fertility fell phase by phase and has now stabilised at record lows.

9.2 Statistical Significance Testing

Pearson r Test: Measures the strength and direction of the linear relationship between two continuous variables. In this case, it quantifies how closely the female labour-force-participation rate (LFPR) and the total fertility rate (TFR) move together in a straight line.

LM Slope (from TFR ~ LFPR): Fits a simple linear regression model to estimate the quantitative impact (slope) of one variable (LFPR) on another (TFR). It tells you the expected change in TFR for a one-unit change in LFPR and assesses the statistical significance of this relationship.

Kruskal–Wallis Test: A non-parametric test used to determine if there are statistically significant differences in the distributions of a continuous variable (fertility rate) across three or more independent groups (age bands).

Chow F (1998 break): Tests for a structural break in a regression model at a known point in time (1998). It checks whether the relationship between TFR and LFPR (the slope and/or intercept of the regression line) changed significantly before and after the Asian Crisis.

p-value is a probability that measures the strength of evidence against a null hypothesis “no effect” obtained from the data (Scribbr, 2025).

Show Code
# Build correlation_data (year-level)
correlation_data <- final_dataset |>                  # uses long data
  filter(age_band == "All") |>                        # total fertility only
  group_by(year, labour_status) |>                    
  summarise(
    fertility_rate = first(fertility_rate),
    total_count    = sum(count, na.rm = TRUE),
    .groups = "drop"
  ) |> 
  pivot_wider(
    names_from  = labour_status,
    values_from = total_count
  ) |> 
  mutate(
    labour_participation_rate = labour_force /
                                (labour_force + outside_labour_force),
    total_female_population   = labour_force + outside_labour_force
  ) |> 
  arrange(year)

# Statistical tests
## Pearson correlation
cor_test <- cor.test(
  correlation_data$fertility_rate,
  correlation_data$labour_participation_rate,
  method = "pearson"
)

## Simple linear model
lm_fit   <- lm(fertility_rate ~ labour_participation_rate, data = correlation_data)
lm_tidy  <- tidy(lm_fit)
lm_glance<- glance(lm_fit)

## Kruskal–Wallis (fertility ~ age_band)
kw_test  <- final_dataset |> 
  filter(age_band != "All") |> 
  kruskal.test(fertility_rate ~ age_band, data = .)

## Chow structural-break test at 1998
chow <- sctest(
  fertility_rate ~ labour_participation_rate,
  type  = "Chow",
  point = which(correlation_data$year == 1998),
  data  = correlation_data
)

# Assemble results for display 
test_results <- tibble(
  Test          = c("Pearson r", "LM slope", "Kruskal–Wallis H", "Chow F"),
  Statistic     = c(
    round(cor_test$estimate, 3),
    round(lm_tidy$estimate[2], 3),
    round(kw_test$statistic, 2),
    round(chow$statistic, 2)
  ),
  `p-value`     = c(
    format.pval(cor_test$p.value),
    format.pval(lm_tidy$p.value[2]),
    format.pval(kw_test$p.value),
    format.pval(chow$p.value)
  ),
  Interpretation = c(
    "Strength & direction of association",
    "Slope of TFR ~ LFPR",
    "Differences across age bands",
    "Structural break at 1998"
  )
)

gt(test_results)
Table 8: Statistical Significance Test Results
Test Statistic p-value Interpretation
Pearson r -0.875 0.0000000023151 Strength & direction of association
LM slope -3.447 0.0000000023151 Slope of TFR ~ LFPR
Kruskal–Wallis H 6515.850 < 0.000000000000000222 Differences across age bands
Chow F 37.310 0.000000060209 Structural break at 1998

Pearson r Test - A very strong inverse linear association between female labour-force-participation rate (LFPR) and total fertility rate (TFR). As women’s LFPR rises, TFR falls almost in lock-step.

LM Slope - From the simple regression TFR ~ LFPR. A one-unit (i.e., 100 percentage-point) rise in LFPR would predict a 3.447 drop in TFR; scaled more realistically, each 10 pp rise in LFPR decreases TFR by about 0.345. The slope is highly significant, confirming that LFPR is a strong linear predictor of fertility.

Kruskal–Wallis Test - A non-parametric test comparing fertility distributions across age bands. The enormous H statistic and vanishingly small p-value tell us the age groups do not share a common median fertility rate—fertility behaviour differs sharply by age band.

Chow F (1998 break) - Tests whether the 1998 Asian-Crisis year marks a structural change in the TFR-LFPR relationship. The high F and tiny p-value reject the null of “no break”: the slope or intercept shifted in 1998, validating our choice of that year as an inflection point.

9.2.1 Key takeways

  • Strength of relationship – Both Pearson r and the LM slope are large in magnitude and highly significant, underscoring that labour-market participation is the dominant quantitative correlate of fertility decline.

  • Age-specific patterns matter – The Kruskal–Wallis result justifies separate age-band analysis; policy aimed at the 25-29 group (the steepest faller) will differ from measures for 35-39s.

  • Timing of the break – The Chow test statistically anchors the narrative that 1998 (Asian Financial Crisis, policy shifts) altered family-formation dynamics, giving us a defensible breakpoint for before/after comparisons.

  • Practical vs. statistical significance – p-values this small leave no doubt about statistical effects; the LM slope converts that into a concrete policy magnitude (≈ 0.035 drop in TFR for every 1 pp rise in LFPR).

Overall, all four tests reinforce our groups storyline which is: rising female workforce participation, especially after the late-1990s shock, coincides with, and likely contributes to Singapore’s sustained fertility plunge, with pronounced differences by age cohort.


10 Team Contributions

Team Member Tasks
Guo Zi Qiang Robin Data Engineering, references, creation of dodged & faceted visualisation
Chew Tze Han
Cheong Wai Hong Jared Supported the data visualisation, Statistical Analysis, Key Findings & Insights, Summary Statistics, Statistical Significance Testing
Akram
Gregory Tan

References

Data.gov.sg. (n.d.a). Data.gov.sg. https://staging.data.gov.sg/datasets?query=household&page=1&searchColumns=Year&resultId=d_e19478b30d8f5cd6a1dc482bf2e46eb7
Data.gov.sg. (n.d.b). Data.gov.sg. https://staging.data.gov.sg/datasets?query=household&page=1&searchColumns=Year&resultId=d_e2475676af29ec78749f1b22cf8b301c
Scribbr. (2025). Statistics. https://www.scribbr.com/category/statistics/
Statistics, S. D. of. (n.d.). Population by age group, sex and type of locality, 2023 [table M810091]. https://tablebuilder.singstat.gov.sg/table/TS/M810091
Tan, T. (2024a). Singapore’s total fertility rate hits record low in 2023, falls below 1 for first time. The Straits Times. https://www.straitstimes.com/singapore/politics/singapore-s-total-fertility-rate-hits-record-low-in-2023-falls-below-1-for-first-time
Tan, T. (2024b). Why the fertility rate doesn’t capture socio-economic or cultural trends. The Straits Times. https://www.straitstimes.com/singapore/why-the-fertility-rate-doesn-t-capture-socio-economic-or-cultural-trends